[Performance] Faster split, chunk and unbind #563
Merged
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 33.6610μs | 12.5175μs | 79.8879 KOps/s | 79.4391 KOps/s | |
test_plain_set_stack_nested | 0.1403ms | 0.1167ms | 8.5675 KOps/s | 8.5356 KOps/s | |
test_plain_set_nested_inplace | 33.1110μs | 14.8945μs | 67.1389 KOps/s | 66.5346 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1731ms | 0.1397ms | 7.1575 KOps/s | 7.0585 KOps/s | |
test_items | 23.4510μs | 4.6897μs | 213.2336 KOps/s | 212.0739 KOps/s | |
test_items_nested | 0.4380ms | 0.3380ms | 2.9587 KOps/s | 2.9765 KOps/s | |
test_items_nested_locked | 0.3700ms | 0.3370ms | 2.9674 KOps/s | 2.9792 KOps/s | |
test_items_nested_leaf | 0.2251ms | 0.1993ms | 5.0173 KOps/s | 5.0583 KOps/s | |
test_items_stack_nested | 1.5176ms | 1.4409ms | 694.0186 Ops/s | 681.2262 Ops/s | |
test_items_stack_nested_leaf | 1.3324ms | 1.2764ms | 783.4259 Ops/s | 776.1847 Ops/s | |
test_items_stack_nested_locked | 0.8573ms | 0.7990ms | 1.2515 KOps/s | 1.2144 KOps/s | |
test_keys | 22.1900μs | 4.7832μs | 209.0662 KOps/s | 218.8575 KOps/s | |
test_keys_nested | 2.1324ms | 91.2871μs | 10.9545 KOps/s | 10.9924 KOps/s | |
test_keys_nested_locked | 0.1518ms | 90.5441μs | 11.0443 KOps/s | 11.1087 KOps/s | |
test_keys_nested_leaf | 43.1521ms | 86.6093μs | 11.5461 KOps/s | 12.2827 KOps/s | |
test_keys_stack_nested | 1.3257ms | 1.2675ms | 788.9477 Ops/s | 789.8717 Ops/s | |
test_keys_stack_nested_leaf | 1.3055ms | 1.2498ms | 800.1033 Ops/s | 791.3357 Ops/s | |
test_keys_stack_nested_locked | 0.6658ms | 0.6140ms | 1.6288 KOps/s | 1.6201 KOps/s | |
test_values | 10.0000μs | 1.8812μs | 531.5763 KOps/s | 528.9723 KOps/s | |
test_values_nested | 67.4410μs | 42.9906μs | 23.2609 KOps/s | 23.1716 KOps/s | |
test_values_nested_locked | 66.8710μs | 42.8514μs | 23.3364 KOps/s | 23.1949 KOps/s | |
test_values_nested_leaf | 58.8610μs | 37.1733μs | 26.9011 KOps/s | 26.6444 KOps/s | |
test_values_stack_nested | 1.1667ms | 1.0963ms | 912.1971 Ops/s | 894.0568 Ops/s | |
test_values_stack_nested_leaf | 1.3514ms | 1.0866ms | 920.3333 Ops/s | 909.1831 Ops/s | |
test_values_stack_nested_locked | 0.5327ms | 0.4769ms | 2.0969 KOps/s | 2.0620 KOps/s | |
test_membership | 3.9383μs | 0.9223μs | 1.0843 MOps/s | 1.0708 MOps/s | |
test_membership_nested | 36.1910μs | 2.2041μs | 453.7007 KOps/s | 446.3732 KOps/s | |
test_membership_nested_leaf | 33.0255μs | 2.0867μs | 479.2270 KOps/s | 463.7395 KOps/s | |
test_membership_stacked_nested | 37.2710μs | 10.6214μs | 94.1500 KOps/s | 92.7750 KOps/s | |
test_membership_stacked_nested_leaf | 43.7110μs | 10.9539μs | 91.2915 KOps/s | 93.1035 KOps/s | |
test_membership_nested_last | 19.4710μs | 4.6215μs | 216.3823 KOps/s | 216.9041 KOps/s | |
test_membership_nested_leaf_last | 36.6310μs | 4.6497μs | 215.0670 KOps/s | 218.4416 KOps/s | |
test_membership_stacked_nested_last | 0.1796ms | 0.1337ms | 7.4778 KOps/s | 7.4937 KOps/s | |
test_membership_stacked_nested_leaf_last | 72.8920μs | 12.6294μs | 79.1804 KOps/s | 78.3065 KOps/s | |
test_nested_getleaf | 35.8410μs | 8.4265μs | 118.6731 KOps/s | 119.5739 KOps/s | |
test_nested_get | 30.2700μs | 7.9268μs | 126.1546 KOps/s | 126.0298 KOps/s | |
test_stacked_getleaf | 0.7861ms | 0.5522ms | 1.8110 KOps/s | 1.8384 KOps/s | |
test_stacked_get | 0.5675ms | 0.5198ms | 1.9239 KOps/s | 1.9630 KOps/s | |
test_nested_getitemleaf | 71.1810μs | 8.4061μs | 118.9613 KOps/s | 118.8258 KOps/s | |
test_nested_getitem | 32.5500μs | 7.9580μs | 125.6603 KOps/s | 125.5521 KOps/s | |
test_stacked_getitemleaf | 0.7634ms | 0.5498ms | 1.8187 KOps/s | 1.8589 KOps/s | |
test_stacked_getitem | 0.5698ms | 0.5196ms | 1.9246 KOps/s | 1.9418 KOps/s | |
test_lock_nested | 4.4341ms | 0.4532ms | 2.2064 KOps/s | 2.7243 KOps/s | |
test_lock_stack_nested | 72.7933ms | 6.6375ms | 150.6583 Ops/s | 194.2667 Ops/s | |
test_unlock_nested | 1.2895ms | 0.4275ms | 2.3390 KOps/s | 2.4913 KOps/s | |
test_unlock_stack_nested | 69.2938ms | 7.3690ms | 135.7041 Ops/s | 165.9123 Ops/s | |
test_flatten_speed | 0.5239ms | 0.1895ms | 5.2768 KOps/s | 5.2724 KOps/s | |
test_unflatten_speed | 0.4053ms | 0.3722ms | 2.6867 KOps/s | 2.6929 KOps/s | |
test_common_ops | 1.0638ms | 0.6085ms | 1.6433 KOps/s | 1.6005 KOps/s | |
test_creation | 37.4700μs | 1.9440μs | 514.4116 KOps/s | 516.3511 KOps/s | |
test_creation_empty | 36.9500μs | 6.9358μs | 144.1788 KOps/s | 142.9254 KOps/s | |
test_creation_nested_1 | 24.8300μs | 9.6697μs | 103.4155 KOps/s | 103.4122 KOps/s | |
test_creation_nested_2 | 32.0710μs | 12.1948μs | 82.0023 KOps/s | 82.0757 KOps/s | |
test_clone | 96.7820μs | 14.1987μs | 70.4291 KOps/s | 67.3505 KOps/s | |
test_getitem[int] | 28.1100μs | 12.1707μs | 82.1644 KOps/s | 79.0210 KOps/s | |
test_getitem[slice_int] | 43.1010μs | 23.6418μs | 42.2980 KOps/s | 34.9642 KOps/s | |
test_getitem[range] | 65.6310μs | 39.4150μs | 25.3711 KOps/s | 20.5263 KOps/s | |
test_getitem[tuple] | 38.8200μs | 20.3664μs | 49.1004 KOps/s | 38.5962 KOps/s | |
test_getitem[list] | 0.2955ms | 36.2402μs | 27.5937 KOps/s | 21.4926 KOps/s | |
test_setitem_dim[int] | 45.7710μs | 26.0464μs | 38.3930 KOps/s | 36.0922 KOps/s | |
test_setitem_dim[slice_int] | 65.3620μs | 46.0694μs | 21.7064 KOps/s | 21.1707 KOps/s | |
test_setitem_dim[range] | 94.9610μs | 62.4814μs | 16.0048 KOps/s | 15.5581 KOps/s | |
test_setitem_dim[tuple] | 60.7710μs | 39.2350μs | 25.4874 KOps/s | 24.0508 KOps/s | |
test_setitem | 0.1282ms | 18.1054μs | 55.2323 KOps/s | 53.5904 KOps/s | |
test_set | 0.1182ms | 17.7211μs | 56.4301 KOps/s | 55.5688 KOps/s | |
test_set_shared | 2.8048ms | 0.1007ms | 9.9308 KOps/s | 9.6422 KOps/s | |
test_update | 0.1025ms | 21.9617μs | 45.5339 KOps/s | 44.4343 KOps/s | |
test_update_nested | 0.1155ms | 31.3244μs | 31.9240 KOps/s | 31.4468 KOps/s | |
test_set_nested | 0.1022ms | 18.6411μs | 53.6450 KOps/s | 51.1353 KOps/s | |
test_set_nested_new | 0.1130ms | 23.7239μs | 42.1517 KOps/s | 41.4113 KOps/s | |
test_select | 0.1193ms | 46.2195μs | 21.6359 KOps/s | 21.7670 KOps/s | |
test_to | 76.4420μs | 52.9571μs | 18.8832 KOps/s | 18.3328 KOps/s | |
test_to_nonblocking | 74.0010μs | 35.2038μs | 28.4060 KOps/s | 27.4869 KOps/s | |
test_unbind_speed | 0.4744ms | 0.3447ms | 2.9007 KOps/s | 3.7381 KOps/s | |
test_unbind_speed_stack0 | 62.1445ms | 5.1957ms | 192.4684 Ops/s | 267.0491 Ops/s | |
test_unbind_speed_stack1 | 3.3485μs | 0.5218μs | 1.9165 MOps/s | 1.8705 MOps/s | |
test_split | 54.1639ms | 1.7984ms | 556.0405 Ops/s | 348.1993 Ops/s | |
test_chunk | 54.5577ms | 1.7831ms | 560.8108 Ops/s | 355.8527 Ops/s | |
test_creation[device0] | 0.3976ms | 0.3096ms | 3.2303 KOps/s | 3.1875 KOps/s | |
test_creation[device1] | 0.4783ms | 0.3121ms | 3.2042 KOps/s | 3.1664 KOps/s | |
test_creation_from_tensor | 0.5908ms | 0.3395ms | 2.9459 KOps/s | 2.9425 KOps/s | |
test_add_one[memmap_tensor0] | 68.3110μs | 24.1442μs | 41.4178 KOps/s | 38.7398 KOps/s | |
test_add_one[memmap_tensor1] | 0.2194ms | 74.3071μs | 13.4577 KOps/s | 13.2923 KOps/s | |
test_contiguous[memmap_tensor0] | 21.6710μs | 5.7946μs | 172.5737 KOps/s | 163.8283 KOps/s | |
test_contiguous[memmap_tensor1] | 49.7910μs | 22.3540μs | 44.7347 KOps/s | 43.6534 KOps/s | |
test_stack[memmap_tensor0] | 48.9200μs | 20.0195μs | 49.9512 KOps/s | 48.1817 KOps/s | |
test_stack[memmap_tensor1] | 0.1541ms | 73.0036μs | 13.6980 KOps/s | 13.1806 KOps/s | |
test_memmaptd_index | 0.2634ms | 0.2250ms | 4.4450 KOps/s | 4.2562 KOps/s | |
test_memmaptd_index_astensor | 0.3208ms | 0.2840ms | 3.5216 KOps/s | 3.1810 KOps/s | |
test_memmaptd_index_op | 0.6126ms | 0.5541ms | 1.8047 KOps/s | 1.7074 KOps/s | |
test_reshape_pytree | 0.2549ms | 20.9664μs | 47.6954 KOps/s | 46.1799 KOps/s | |
test_reshape_td | 64.1910μs | 31.0007μs | 32.2573 KOps/s | 31.4152 KOps/s | |
test_view_pytree | 39.3410μs | 20.6280μs | 48.4778 KOps/s | 46.8664 KOps/s | |
test_view_td | 25.0800μs | 4.0748μs | 245.4115 KOps/s | 247.7869 KOps/s | |
test_unbind_pytree | 53.2910μs | 25.9424μs | 38.5469 KOps/s | 37.4126 KOps/s | |
test_unbind_td | 83.8410μs | 55.9009μs | 17.8888 KOps/s | 24.5093 KOps/s | |
test_split_pytree | 38.3110μs | 23.3933μs | 42.7472 KOps/s | 41.1151 KOps/s | |
test_split_td | 67.4510μs | 42.7063μs | 23.4157 KOps/s | 14.7408 KOps/s | |
test_add_pytree | 0.1029ms | 32.1250μs | 31.1284 KOps/s | 30.2911 KOps/s | |
test_add_td | 67.9910μs | 44.0228μs | 22.7155 KOps/s | 21.2645 KOps/s | |
test_distributed | 25.9300μs | 5.4703μs | 182.8051 KOps/s | 182.3007 KOps/s | |
test_tdmodule | 1.7622ms | 18.1932μs | 54.9657 KOps/s | 59.5932 KOps/s | |
test_tdmodule_dispatch | 0.1855ms | 32.9261μs | 30.3710 KOps/s | 30.4001 KOps/s | |
test_tdseq | 35.5410μs | 19.6712μs | 50.8357 KOps/s | 49.7514 KOps/s | |
test_tdseq_dispatch | 58.0210μs | 35.6586μs | 28.0437 KOps/s | 27.9625 KOps/s | |
test_instantiation_functorch | 1.7459ms | 1.6657ms | 600.3397 Ops/s | 588.4496 Ops/s | |
test_instantiation_td | 1.7740ms | 1.1704ms | 854.4074 Ops/s | 849.6794 Ops/s | |
test_exec_functorch | 0.2249ms | 0.1578ms | 6.3377 KOps/s | 6.1359 KOps/s | |
test_exec_td | 0.2252ms | 0.1487ms | 6.7228 KOps/s | 6.5616 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1356ms | 1.0854ms | 921.3252 Ops/s | 929.2716 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.6956ms | 0.6258ms | 1.5979 KOps/s | 1.6048 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.0469ms | 0.9969ms | 1.0031 KOps/s | 1.0090 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7063ms | 0.5589ms | 1.7893 KOps/s | 1.7710 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.8072ms | 12.6555ms | 79.0167 Ops/s | 77.8207 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.4560ms | 8.3770ms | 119.3748 Ops/s | 119.9304 Ops/s | |
test_vmap_transformer_speed[False-True] | 12.7538ms | 12.6316ms | 79.1668 Ops/s | 79.4065 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.3672ms | 8.3070ms | 120.3801 Ops/s | 121.2295 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 29.4450μs | 16.4963μs | 60.6198 KOps/s | 60.2416 KOps/s | |
test_plain_set_stack_nested | 0.2066ms | 0.1512ms | 6.6128 KOps/s | 6.7255 KOps/s | |
test_plain_set_nested_inplace | 41.5780μs | 19.4627μs | 51.3803 KOps/s | 51.8603 KOps/s | |
test_plain_set_stack_nested_inplace | 0.3429ms | 0.1773ms | 5.6416 KOps/s | 5.6978 KOps/s | |
test_items | 20.9090μs | 2.4024μs | 416.2581 KOps/s | 365.2251 KOps/s | |
test_items_nested | 0.5016ms | 0.2715ms | 3.6832 KOps/s | 3.7454 KOps/s | |
test_items_nested_locked | 1.0790ms | 0.2719ms | 3.6783 KOps/s | 3.7553 KOps/s | |
test_items_nested_leaf | 0.3192ms | 0.1649ms | 6.0651 KOps/s | 6.0511 KOps/s | |
test_items_stack_nested | 1.7411ms | 1.4366ms | 696.0915 Ops/s | 687.8216 Ops/s | |
test_items_stack_nested_leaf | 2.0609ms | 1.3095ms | 763.6699 Ops/s | 755.1914 Ops/s | |
test_items_stack_nested_locked | 1.8480ms | 0.7590ms | 1.3175 KOps/s | 1.3045 KOps/s | |
test_keys | 23.1530μs | 3.7992μs | 263.2109 KOps/s | 260.8091 KOps/s | |
test_keys_nested | 0.5315ms | 0.1408ms | 7.1002 KOps/s | 6.0885 KOps/s | |
test_keys_nested_locked | 0.2071ms | 0.1407ms | 7.1076 KOps/s | 7.2719 KOps/s | |
test_keys_nested_leaf | 0.3004ms | 0.1406ms | 7.1121 KOps/s | 7.1853 KOps/s | |
test_keys_stack_nested | 1.5541ms | 1.3807ms | 724.2662 Ops/s | 735.5281 Ops/s | |
test_keys_stack_nested_leaf | 1.5083ms | 1.3747ms | 727.4133 Ops/s | 724.8835 Ops/s | |
test_keys_stack_nested_locked | 1.1801ms | 0.6754ms | 1.4807 KOps/s | 1.4612 KOps/s | |
test_values | 5.1172μs | 1.1384μs | 878.3946 KOps/s | 844.2885 KOps/s | |
test_values_nested | 87.2730μs | 49.2701μs | 20.2963 KOps/s | 20.2073 KOps/s | |
test_values_nested_locked | 0.1080ms | 49.0445μs | 20.3896 KOps/s | 20.2569 KOps/s | |
test_values_nested_leaf | 58.3890μs | 43.7093μs | 22.8784 KOps/s | 22.6080 KOps/s | |
test_values_stack_nested | 1.9029ms | 1.1573ms | 864.0748 Ops/s | 823.5532 Ops/s | |
test_values_stack_nested_leaf | 1.8139ms | 1.1471ms | 871.7794 Ops/s | 855.2624 Ops/s | |
test_values_stack_nested_locked | 0.6453ms | 0.5055ms | 1.9784 KOps/s | 1.9120 KOps/s | |
test_membership | 16.5710μs | 1.3591μs | 735.7675 KOps/s | 669.5632 KOps/s | |
test_membership_nested | 20.8690μs | 2.8008μs | 357.0355 KOps/s | 356.2264 KOps/s | |
test_membership_nested_leaf | 26.8300μs | 2.7827μs | 359.3693 KOps/s | 358.9039 KOps/s | |
test_membership_stacked_nested | 27.4620μs | 11.6500μs | 85.8366 KOps/s | 85.2097 KOps/s | |
test_membership_stacked_nested_leaf | 40.2750μs | 11.7830μs | 84.8683 KOps/s | 85.1776 KOps/s | |
test_membership_nested_last | 34.0710μs | 5.9839μs | 167.1147 KOps/s | 169.8950 KOps/s | |
test_membership_nested_leaf_last | 23.0020μs | 5.9726μs | 167.4320 KOps/s | 169.6993 KOps/s | |
test_membership_stacked_nested_last | 0.3506ms | 0.1727ms | 5.7913 KOps/s | 5.9644 KOps/s | |
test_membership_stacked_nested_leaf_last | 34.6150μs | 13.8368μs | 72.2710 KOps/s | 73.2589 KOps/s | |
test_nested_getleaf | 29.5050μs | 10.8681μs | 92.0125 KOps/s | 93.4101 KOps/s | |
test_nested_get | 37.0090μs | 10.7388μs | 93.1198 KOps/s | 98.5246 KOps/s | |
test_stacked_getleaf | 1.0345ms | 0.6141ms | 1.6284 KOps/s | 1.6111 KOps/s | |
test_stacked_get | 1.2884ms | 0.5860ms | 1.7065 KOps/s | 1.6950 KOps/s | |
test_nested_getitemleaf | 30.9680μs | 10.6682μs | 93.7365 KOps/s | 94.4220 KOps/s | |
test_nested_getitem | 41.0760μs | 10.2695μs | 97.3759 KOps/s | 99.8283 KOps/s | |
test_stacked_getitemleaf | 0.9649ms | 0.6083ms | 1.6438 KOps/s | 1.6039 KOps/s | |
test_stacked_getitem | 0.6948ms | 0.5790ms | 1.7270 KOps/s | 1.6907 KOps/s | |
test_lock_nested | 55.1172ms | 0.5444ms | 1.8368 KOps/s | 2.6116 KOps/s | |
test_lock_stack_nested | 69.2491ms | 7.6829ms | 130.1596 Ops/s | 267.2835 Ops/s | |
test_unlock_nested | 58.4081ms | 0.5046ms | 1.9817 KOps/s | 2.5167 KOps/s | |
test_unlock_stack_nested | 63.7122ms | 7.3987ms | 135.1587 Ops/s | 169.2324 Ops/s | |
test_flatten_speed | 0.5506ms | 0.2720ms | 3.6762 KOps/s | 3.6989 KOps/s | |
test_unflatten_speed | 0.5847ms | 0.4810ms | 2.0789 KOps/s | 2.1018 KOps/s | |
test_common_ops | 4.1066ms | 0.7015ms | 1.4256 KOps/s | 1.4123 KOps/s | |
test_creation | 25.2170μs | 2.4023μs | 416.2719 KOps/s | 418.7368 KOps/s | |
test_creation_empty | 23.1930μs | 8.5691μs | 116.6985 KOps/s | 109.9100 KOps/s | |
test_creation_nested_1 | 37.0590μs | 12.4023μs | 80.6305 KOps/s | 74.6273 KOps/s | |
test_creation_nested_2 | 38.5120μs | 15.6443μs | 63.9210 KOps/s | 61.0338 KOps/s | |
test_clone | 0.1057ms | 13.2143μs | 75.6757 KOps/s | 73.5106 KOps/s | |
test_getitem[int] | 31.3490μs | 12.4660μs | 80.2181 KOps/s | 75.7262 KOps/s | |
test_getitem[slice_int] | 58.9100μs | 24.5722μs | 40.6964 KOps/s | 31.1846 KOps/s | |
test_getitem[range] | 94.2150μs | 43.6757μs | 22.8960 KOps/s | 17.7976 KOps/s | |
test_getitem[tuple] | 45.9660μs | 19.6171μs | 50.9761 KOps/s | 41.4888 KOps/s | |
test_getitem[list] | 0.2406ms | 39.0110μs | 25.6338 KOps/s | 19.6797 KOps/s | |
test_setitem_dim[int] | 68.6180μs | 27.9225μs | 35.8134 KOps/s | 34.6681 KOps/s | |
test_setitem_dim[slice_int] | 89.4670μs | 51.3014μs | 19.4926 KOps/s | 18.5518 KOps/s | |
test_setitem_dim[range] | 0.1113ms | 72.9033μs | 13.7168 KOps/s | 13.6633 KOps/s | |
test_setitem_dim[tuple] | 82.8550μs | 41.2980μs | 24.2143 KOps/s | 23.5141 KOps/s | |
test_setitem | 80.8810μs | 18.5265μs | 53.9768 KOps/s | 51.0460 KOps/s | |
test_set | 86.2510μs | 17.7731μs | 56.2647 KOps/s | 52.9183 KOps/s | |
test_set_shared | 1.8165ms | 0.1384ms | 7.2268 KOps/s | 7.2596 KOps/s | |
test_update | 90.3480μs | 23.9673μs | 41.7235 KOps/s | 40.7637 KOps/s | |
test_update_nested | 88.1040μs | 34.1754μs | 29.2608 KOps/s | 28.2487 KOps/s | |
test_set_nested | 85.8000μs | 19.7926μs | 50.5240 KOps/s | 48.7807 KOps/s | |
test_set_nested_new | 0.1215ms | 26.8921μs | 37.1856 KOps/s | 37.5265 KOps/s | |
test_select | 0.1269ms | 51.2621μs | 19.5076 KOps/s | 19.3639 KOps/s | |
test_unbind_speed | 0.6532ms | 0.3708ms | 2.6966 KOps/s | 3.7486 KOps/s | |
test_unbind_speed_stack0 | 63.9701ms | 5.3125ms | 188.2340 Ops/s | 255.1592 Ops/s | |
test_unbind_speed_stack1 | 1.8705μs | 0.6338μs | 1.5777 MOps/s | 1.5348 MOps/s | |
test_split | 2.0059ms | 1.6426ms | 608.7770 Ops/s | 324.1649 Ops/s | |
test_chunk | 56.4894ms | 1.7453ms | 572.9645 Ops/s | 340.6940 Ops/s | |
test_creation[device0] | 0.3714ms | 0.2896ms | 3.4530 KOps/s | 3.4066 KOps/s | |
test_creation_from_tensor | 3.1503ms | 0.3259ms | 3.0688 KOps/s | 3.0103 KOps/s | |
test_add_one[memmap_tensor0] | 67.5360μs | 25.3178μs | 39.4978 KOps/s | 38.7766 KOps/s | |
test_contiguous[memmap_tensor0] | 2.7530ms | 5.8187μs | 171.8603 KOps/s | 176.8843 KOps/s | |
test_stack[memmap_tensor0] | 94.9770μs | 18.6182μs | 53.7110 KOps/s | 52.5598 KOps/s | |
test_memmaptd_index | 0.4052ms | 0.1908ms | 5.2416 KOps/s | 5.3140 KOps/s | |
test_memmaptd_index_astensor | 0.3947ms | 0.2583ms | 3.8715 KOps/s | 3.9855 KOps/s | |
test_memmaptd_index_op | 1.2280ms | 0.4988ms | 2.0047 KOps/s | 2.0104 KOps/s | |
test_reshape_pytree | 0.2863ms | 23.1407μs | 43.2139 KOps/s | 42.6054 KOps/s | |
test_reshape_td | 72.0140μs | 32.6757μs | 30.6037 KOps/s | 30.3858 KOps/s | |
test_view_pytree | 59.2710μs | 22.9230μs | 43.6244 KOps/s | 42.9537 KOps/s | |
test_view_td | 21.5000μs | 4.9208μs | 203.2199 KOps/s | 202.4707 KOps/s | |
test_unbind_pytree | 62.4160μs | 26.0819μs | 38.3408 KOps/s | 38.2756 KOps/s | |
test_unbind_td | 0.1174ms | 58.5955μs | 17.0662 KOps/s | 24.5175 KOps/s | |
test_split_pytree | 67.9170μs | 25.9602μs | 38.5204 KOps/s | 38.1016 KOps/s | |
test_split_td | 0.1084ms | 45.4651μs | 21.9949 KOps/s | 13.3940 KOps/s | |
test_add_pytree | 87.7740μs | 31.8174μs | 31.4293 KOps/s | 30.7939 KOps/s | |
test_add_td | 99.4960μs | 45.6419μs | 21.9097 KOps/s | 21.2069 KOps/s | |
test_distributed | 26.7900μs | 6.0052μs | 166.5235 KOps/s | 165.8579 KOps/s | |
test_tdmodule | 0.1009ms | 21.8140μs | 45.8420 KOps/s | 42.4383 KOps/s | |
test_tdmodule_dispatch | 0.1776ms | 40.5871μs | 24.6384 KOps/s | 25.0303 KOps/s | |
test_tdseq | 0.3550ms | 25.3614μs | 39.4300 KOps/s | 40.0340 KOps/s | |
test_tdseq_dispatch | 0.4215ms | 44.3986μs | 22.5232 KOps/s | 22.8199 KOps/s | |
test_instantiation_functorch | 1.6955ms | 1.2944ms | 772.5562 Ops/s | 772.6788 Ops/s | |
test_instantiation_td | 1.7365ms | 1.0075ms | 992.5636 Ops/s | 989.9567 Ops/s | |
test_exec_functorch | 0.2474ms | 0.1467ms | 6.8147 KOps/s | 6.7463 KOps/s | |
test_exec_td | 0.2161ms | 0.1405ms | 7.1153 KOps/s | 6.9240 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.0563ms | 0.8762ms | 1.1414 KOps/s | 1.1250 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7325ms | 0.4692ms | 2.1314 KOps/s | 2.1245 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.5075ms | 0.7640ms | 1.3088 KOps/s | 1.2910 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6678ms | 0.3870ms | 2.5837 KOps/s | 2.5871 KOps/s |

Incidentally, this PR also fixes a bug in unbind, which makes unbind slightly slower than it used to be.
Labels: CLA Signed, Performance
We pre-compute the batch-size to accelerate split, chunk and unbind.
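
To illustrate the idea, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `ToyBatch` container whose constructor normally infers and validates the batch size from its contents. The `split` method works out the size of every chunk once, from the batch size it already knows, and passes it to the constructor so that each output skips the inference step. The class, its `batch_size` argument and the use of plain Python lists in place of tensors are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class ToyBatch:
    """Hypothetical stand-in for a tensor container; not the library's class."""

    def __init__(self, data: Dict[str, list], batch_size: Optional[int] = None):
        if batch_size is None:
            # Slow path: infer and validate the batch size from every entry.
            sizes = {len(v) for v in data.values()}
            if len(sizes) != 1:
                raise ValueError(f"inconsistent batch sizes: {sizes}")
            batch_size = sizes.pop()
        # Fast path: a caller-supplied batch size is trusted and not re-checked.
        self.data = data
        self.batch_size = batch_size

    def split(self, split_size: int) -> List["ToyBatch"]:
        if split_size <= 0:
            raise ValueError("split_size must be positive")
        # Pre-compute (start, size) for every chunk once, from the known batch size.
        bounds = [
            (start, min(split_size, self.batch_size - start))
            for start in range(0, self.batch_size, split_size)
        ]
        # Build each output with its batch size already known.
        return [
            ToyBatch({k: v[s:s + n] for k, v in self.data.items()}, batch_size=n)
            for s, n in bounds
        ]


td = ToyBatch({"a": list(range(10)), "b": list(range(10, 20))})
parts = td.split(4)
print([p.batch_size for p in parts])  # [4, 4, 2]
```

The trade-off in this sketch is that the fast path relies on the caller to pass a correct batch size, so it only suits internal call sites such as `split`, where the sizes follow directly from an already validated container.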